{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "387e2968-3bfd-48c6-a925-d315f4566623",
   "metadata": {},
   "source": [
    "# Instant Gratification\n",
    "## Your first Frontier LLM Project!\n",
    "Using **Gemini API** to summarise transcripts from class videos. <br>\n",
    "Tested with: *day_1_first_llm_experiment_summarization_project* transcript video. \n",
    "## [Test_video](https://www.udemy.com/course/llm-engineering-master-ai-and-large-language-models/learn/lecture/46867741#questions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9540582d-8d2a-4c14-b117-850823b634a0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "import os, sys\n",
    "import google.generativeai as genai\n",
    "from dotenv import load_dotenv\n",
    "from IPython.display import HTML, Markdown, display"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fe0d366-b183-415c-b6e1-4993afd82f2a",
   "metadata": {},
   "source": [
    "# Connecting to Gemini API\n",
    "\n",
    "The next cell is where we load in the environment variables in your `.env` file and connect to OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89c1194c-715b-41ff-8cb7-6b6067c83ea5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load environment variables in a file called .env\n",
    "load_dotenv()\n",
    "api_key = os.getenv('GOOGLE_API_KEY')\n",
    "\n",
    "# Check the key\n",
    "if not api_key:\n",
    "    print(\"No API key was found!\")\n",
    "else:\n",
    "    print(\"Great! API key found and looks good so far!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bc036e2-54c1-4206-a386-371a9705b190",
   "metadata": {},
   "source": [
    "# Upload Daily or Weekly Transcriptions\n",
    "If you have text files corresponding to your video transcripts, upload them by day or week. With the help of Cutting-edge LLM models, you will get accurate summaries, highlighting key topics."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fcf4b72-49c9-49cd-8b1c-b5a4df38edf7",
   "metadata": {},
   "source": [
    "## Read data from txt files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00466898-68d9-43f7-b8d3-7d61696061de",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "```\n",
    "# Read the entire file using read() function\n",
    "file = open(\"../day_1_first_llm_experiment_summarization_project.txt\", \"r\") # Your file path\n",
    "file_content = file.read()\n",
    "text = file_content\n",
    "file.close()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37d15b83-786d-40e9-b730-4f654e8bec1e",
   "metadata": {},
   "source": [
    "## Types of prompts\n",
    "\n",
    "You may know this already - but if not, you will get very familiar with it!\n",
    "\n",
    "Models like GPT4o have been trained to receive instructions in a particular way.\n",
    "\n",
    "They expect to receive:\n",
    "\n",
    "**A system prompt** that tells them what task they are performing and what tone they should use\n",
    "\n",
    "**A user prompt** -- the conversation starter that they should reply to\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "846c63a4-14e0-4a3c-99ce-654a6928dc20",
   "metadata": {},
   "source": [
    "### For this example, we will directly input the text file into the prompt."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6ef537b-a660-44e3-a0c2-94f3b9e60b11",
   "metadata": {},
   "source": [
    "## Messages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1e96271-593a-4e16-bb17-81c834a59178",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_message = \"You are an assistant that analyzes the contents of text files \\\n",
    "and provides an accurate summary, ignoring text that might be irrelevant. \\\n",
    "Respond in markdown.\"\n",
    "\n",
    "#user_prompt = file_content Use if you load your data\n",
    "user_prompt = \"\"\"\n",
    "It's time for our first LM experiment at this point.\n",
    "So some of this you may know well, you may know very well already.\n",
    "For some people this might be new, but let me just explain.\n",
    "The models that we're going to be using.\n",
    "These frontier models have been trained in a particular way.\n",
    "That means that they expect two different types of instruction from us the user.\n",
    "One of them is known as the system prompt, and one of them is known as the user prompt.\n",
    "The system prompt is something which explains the context of this conversation.\n",
    "It tells them what kind of task they're performing, what tone they should use, and we'll be experimenting\n",
    "with what it means to to change a system prompt and what kind of information that you can include in\n",
    "the system prompt throughout this course.\n",
    "The user prompt is the actual conversation itself.\n",
    "And in our case right now, it's going to just be the the conversation starter.\n",
    "And the role of the LM of the large language model is to figure out what is the most likely way that\n",
    "it should respond, given this user prompt.\n",
    "If it's given this user prompt, and in the context of this system prompt, what is the most likely\n",
    "next text that will come after it?\n",
    "That would come from an assistant responding to this user.\n",
    "So that's the difference between the system prompt that sets the context, the user prompt that is the\n",
    "conversation starter.\n",
    "So we're going to set a system prompt.\n",
    "And this is what it's going to say.\n",
    "It's going to say you are an assistant that analyzes the contents of a website and provides a short\n",
    "summary, ignoring texts that might be navigation related.\n",
    "Respond in markdown.\n",
    "You'll see more of what that means in in just a second.\n",
    "So that is our system prompt for the user prompt.\n",
    "It's going to take as a we're going to write a function user prompt for.\n",
    "And it's going to take a website as the argument to the function.\n",
    "And it's going to say you are looking at a website titled The Website.\n",
    "The contents of this website is as follows.\n",
    "Please provide a short summary of the website in markdown if it includes news or announcements.\n",
    "Summarize these two and we then take the text from the website object that Beautifulsoup plucked out\n",
    "for us, and we add that into the user prompt and we return that user prompt.\n",
    "So let's just quickly let's run that cell right now and let's just have a look now.\n",
    "So after doing that, if I just look at what system Prompt has.\n",
    "It has that text of course that we just said.\n",
    "And now if you remember earlier on we created a new website object and we stored it in this variable\n",
    "editor.\n",
    "So if I come here I should be able to say user prompt for and then pass in the object Ed.\n",
    "And what we'll get is a prompt.\n",
    "It might be easier if I print this so that it prints out empty lines.\n",
    "And here is the user prompt string that we've created.\n",
    "It says you're looking at a website titled blah blah blah.\n",
    "The contents of this website is as follows.\n",
    "Please provide a short summary.\n",
    "Look, it looks like we should have a space right here, otherwise it might be confusing.\n",
    "Let's try that again.\n",
    "That's always why it's worth printing things as you go, because you'll spot little inconsistencies\n",
    "like that.\n",
    "I think it'll be nicer, actually, now that I look at that.\n",
    "If we have a carriage return there like so.\n",
    "Let's have a look at this prompt.\n",
    "Now you're looking at the website and there we go on a separate line that looks good okay.\n",
    "So let's talk about the messages object.\n",
    "So OpenAI expects to receive a conversation in a particular format.\n",
    "It's a format that OpenAI came up with and they used for their APIs, and it became so well used that\n",
    "all of the other major frontier models decided to adopt the same convention.\n",
    "So this has gone from being originally OpenAI's way of using the API to being something of a standard\n",
    "across many different models to use this approach.\n",
    "And here's how it works.\n",
    "When you're trying to describe a conversation, you describe it using a list a Python list of dictionaries.\n",
    "So it's a list where each element in the list is a dictionary.\n",
    "And that dictionary looks like this.\n",
    "It's a dictionary with two elements.\n",
    "One of them has a key of role, and here the value is either system or user, a key of role.\n",
    "And the value is system a key of content.\n",
    "And the value is of course the system message.\n",
    "There's another Dictionary where there's a key of role.\n",
    "The value is user because it's the user message.\n",
    "The user prompt content is where the user message goes.\n",
    "User message and user prompt are the same thing.\n",
    "So hopefully I didn't explain it very well, but it makes sense when you see it visually like this.\n",
    "It's just a dictionary which has role and content, system and system, message user and the user message.\n",
    "And there are some other roles as well, but we're going to get to them in good time.\n",
    "This is all we need for now.\n",
    "So this is how messages are built.\n",
    "And if you look at this next function def messages for hopefully it's super clear to you that this is\n",
    "creating.\n",
    "This here is creating exactly this construct using code.\n",
    "It's going to do it's going to put in there the generic system prompt we came up with.\n",
    "And it's going to create the user prompt for the website.\n",
    "So let's run that.\n",
    "And now, presumably it's clear that if I say messages for Ed, which is the object for my website,\n",
    "let's print it so that we see empty lines and stuff.\n",
    "Actually, sorry, in this case it might be better if we don't print it.\n",
    "If we just do this, it might look a bit clearer.\n",
    "There we go.\n",
    "And now you can see that it is it's a list of two things role system.\n",
    "And there's a system message role user.\n",
    "And there is the user message.\n",
    "Okay.\n",
    "It's time to bring this together.\n",
    "It's time to actually do it.\n",
    "The API for OpenAI to make a call to a frontier model to do this for us is super simple, and we're\n",
    "going to be using this API all the time.\n",
    "So whereas now it might look like it's a few things to remember.\n",
    "You're going to get so used to this, but we're going to make a function called summarize.\n",
    "And that is that's going to do the business that's going to solve our problem and summarize a URL that's\n",
    "passed in.\n",
    "It will first create a website for that URL, just like we did for editor.\n",
    "And this is where we call OpenAI.\n",
    "We say OpenAI, which is the the OpenAI object.\n",
    "We created OpenAI dot chat, dot completions, dot create.\n",
    "And that for now you can just learn it by rote.\n",
    "We'll understand a lot more about that later.\n",
    "But as far as OpenAI is concerned, this is known as the completions API because we're asking it to\n",
    "complete this conversation, predict what would be most likely to come next.\n",
    "We pass in the name of the model we're going to use.\n",
    "We're going to use a model called GPT four mini that you'll get very familiar with.\n",
    "It is the light, cheap version of GPT four, the the one of the finest models on the planet, and this\n",
    "will cost fractions of a cent to use.\n",
    "This, um, you pass in the model and then you pass in the messages and the messages we pass in, use\n",
    "this structure that we've just created and that is all it takes.\n",
    "What comes back we put in this this object response.\n",
    "And when we get back the response we call response dot choices zero dot message dot content.\n",
    "Now I'm going to explain what this is another day we don't need to know.\n",
    "For now.\n",
    "We just need to know that we're going to do response dot choices zero dot message dot content.\n",
    "That's going to be it.\n",
    "That is our summarize function.\n",
    "And with that let's try summarizing my website we're running.\n",
    "It's now connecting to OpenAI in the cloud.\n",
    "It's making the call and back.\n",
    "Here is a summary of my website.\n",
    "We have just uh, spent a fraction of a cent and we have just summarized my website.\n",
    "We can do a little bit better because we can print this in a nice style.\n",
    "Uh, GPT four, we've asked to respond in markdown, and that means that it's responded with various\n",
    "characters to represent headings, things in bold and so on.\n",
    "And we can use a feature of Jupyter Labs that we can ask it to actually show that in a nice markdown\n",
    "format.\n",
    "So let's do that.\n",
    "Let's use this display summary function and try again.\n",
    "Again we're going to GPT for a mini in the cloud.\n",
    "And here is a summary of my website.\n",
    "Uh, it says something about me.\n",
    "Uh, and it's uh yeah, very nicely formatted, very nicely structured.\n",
    "Pretty impressive.\n",
    "And apparently it highlights my work with proprietary LMS, offers resources related to AI and LMS,\n",
    "showcasing his commitment to advancing knowledge in this field.\n",
    "Good for you, GPT for mini.\n",
    "That's a very nice summary.\n",
    "Okay.\n",
    "And now we can try some more websites.\n",
    "Let's try summarizing cnn.com.\n",
    "Uh, we'll see what this happens.\n",
    "Obviously, CNN is a much bigger, uh, result you've got here.\n",
    "Uh, and, uh, we get some information about what's going on.\n",
    "I'm actually recording this right now on the 5th of November at, uh, in the evening, which is the\n",
    "date of the 2024 elections going on right now.\n",
    "So that, of course, is featured on CNN's web page.\n",
    "We can also summarize anthropic, which is the website for Claude.\n",
    "And they have a nice page.\n",
    "And here you go.\n",
    "And you can read more about it in this nice little summary of their web page.\n",
    "All right.\n",
    "And that wraps up our first instant gratification.\n",
    "It's it's juicy.\n",
    "It's something where we've actually done something useful.\n",
    "We've scraped the web.\n",
    "We've summarized summarization is one of the most common AI use cases.\n",
    "So common it's useful for all sorts of purposes.\n",
    "We'll be doing it a few different ways during during this course, even in our week eight a sticky solution\n",
    "will be using something that will do some summarization.\n",
    "So it's a great, uh, thing to have experimented with already.\n",
    "So there are so many other business applications of summarization.\n",
    "This is something you should be able to put to good use.\n",
    "You should be able to think of some ways you could apply this to your day job right away, or be building\n",
    "a couple of example projects in GitHub that show summarization in action.\n",
    "You could apply it to summarizing the news, summarizing financial performance from a financial report,\n",
    "a resume, and a cover letter.\n",
    "You could you could take a resume and generate a cover letter.\n",
    "Uh, there are so many different things you can do with summarization of of documents.\n",
    "And also adding on to that the scraping the web angle of it.\n",
    "So have a think about how you would apply summarization to your business and try extending this to do\n",
    "some summarization.\n",
    "There's also uh, for for the more technically inclined, uh, one of the things that you'll discover\n",
    "quite quickly when you use this is that there are many websites that cannot be summarized with this\n",
    "approach, and that's because they use JavaScript to render the web page and are rather simplistic.\n",
    "Approach has just taken the the just just made the requests the server call and taken what we get back.\n",
    "But there's a solution.\n",
    "And the solution is to use a platform like selenium or others like it, or playwright, which would\n",
    "allow you to render the page and and do it that way.\n",
    "So if you're technically inclined and have some background with that kind of thing, then a really interesting\n",
    "challenge is to turn this into something that's a bit beefier and add selenium to the mix.\n",
    "Um, as it happens, someone has already done that.\n",
    "Uh, one of the students, thank you very much.\n",
    "And if you go into this folder community contributions, you'll see a few different solutions.\n",
    "And one of them is a selenium based solution.\n",
    "So you can always go in and just just look at that yourself.\n",
    "Or you can have a shot at doing it too.\n",
    "And you'll find the solution in there.\n",
    "And if you do come up with a solution to that or to anything, I would love it if you were willing to\n",
    "share your code so that others can benefit from it.\n",
    "Ideally, put it in the community contributions folder and be sure to clear the output.\n",
    "So you go to kernel restart kernel and clear outputs of all cells.\n",
    "Otherwise, everything that you've got in your output would also get checked into code which which would\n",
    "just clutter things up a bit.\n",
    "So so do that.\n",
    "And then if you could submit a PR, a pull request, I can then merge that into the code.\n",
    "And if that's a new thing for you, it is a bit of a process.\n",
    "There is a write up here for exactly what you need to do to make that work.\n",
    "Anyways, this was the first project, the first of many.\n",
    "It's a simple project, but it's an important one.\n",
    "A very important business use case.\n",
    "I hope you found it worthwhile.\n",
    "I will see you for the next video when we wrap up.\n",
    "Week one.\n",
    "Day one.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab80b7dd-4b07-4460-9bdd-90bb6ba9e285",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompts = [\n",
    "    {\"role\": \"system\", \"content\": system_message},\n",
    "    {\"role\": \"user\", \"content\": user_prompt}\n",
    "  ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a04ec8b-4d44-4a90-9d84-34fbf757bbe4",
   "metadata": {},
   "source": [
    "## The structure to connect with Gemini API was taken from this contribution. \n",
    "### [From this notebook](https://github.com/ed-donner/llm_engineering/blob/main/week2/community-contributions/day1-with-3way.ipynb)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3aecf1b4-786c-4834-8cae-0a2758ea3edd",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# The API for Gemini - Structure\n",
    "genai.configure(api_key=api_key)\n",
    "\n",
    "gemini = genai.GenerativeModel(\n",
    "    model_name='gemini-1.5-flash',\n",
    "    system_instruction=system_message\n",
    ")\n",
    "response = gemini.generate_content(user_prompt)\n",
    "response = response.text\n",
    "# response = str(response.text) Convert to string in order to save the response as text file\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf9a97d9-d935-40de-9736-e566a26dff25",
   "metadata": {},
   "source": [
    "## To save the processed text data as a file, utilize the following code:\n",
    "\n",
    "```\n",
    "# This is a common pattern for writing text to a file in Python, \n",
    "with open('data_transcript/pro_summary.txt', 'w') as fp:\n",
    "    fp.write(response)\n",
    "    fp.close()\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a36a01fa-5718-4bee-bb1b-ad742ab86d6a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Markdown(response.text) If you convert the data type of the variable \"response\" to a string\n",
    "Markdown(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe959162-2c24-4077-b273-ea924e568731",
   "metadata": {},
   "source": [
    "# Key Benefits of AI Summarization:\n",
    "\n",
    "__Time-Saving:__ Quickly process large volumes of text, such as research papers, reports, and news articles. <br>\n",
    "__Improved Comprehension:__ Identify key points and insights more efficiently. <br>\n",
    "__Enhanced Decision-Making:__ Make informed decisions based on accurate and concise information. <br>\n",
    "__Cost Reduction:__ Reduce labor costs associated with manual summarization tasks. <br>\n",
    "\n",
    "# Potential Applications in Business Development:\n",
    "\n",
    "__Market Research:__ Quickly analyze market reports and competitor insights to identify trends and opportunities. <br>\n",
    "__Sales and Marketing:__ Summarize customer feedback and product reviews to inform marketing strategies. <br>\n",
    "__Customer Support:__ Quickly process customer inquiries and provide accurate answers. <br>\n",
    "__Legal and Compliance:__ Analyze legal documents and contracts to identify key clauses and potential risks. <br>\n",
    "__Human Resources:__ Summarize job applications and performance reviews to streamline hiring and evaluation processes."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llm",
   "language": "python",
   "name": "llm"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
